alignment error
Accurate and Efficient Low-Rank Model Merging in Core Space
In this paper, we address the challenges associated with merging low-rank adaptations of large neural networks. With the rise of parameter-efficient adaptation techniques, such as Low-Rank Adaptation (LoRA), model fine-tuning has become more accessible. While fine-tuning models with LoRA is highly efficient, existing merging methods often sacrifice this efficiency by merging fully-sized weight matrices. We propose the Core Space merging framework, which enables the merging of LoRA-adapted models within a common alignment basis, thereby preserving the efficiency of low-rank adaptation while substantially improving accuracy across tasks. We further provide a formal proof that projection into Core Space ensures no loss of information and provide a complexity analysis showing the efficiency gains. Extensive empirical results demonstrate that Core Space significantly improves existing merging techniques and achieves state-of-the-art results on both vision and language tasks while utilizing a fraction of the computational resources.
A robust inlier identification algorithm for point cloud registration via \mathbf{\ell_0} -minimization
Correspondences in point cloud registration are prone to outliers, significantly reducing registration accuracy and highlighting the need for precise inlier identification. In this paper, we propose a robust inlier identification algorithm for point cloud registration by reformulating the conventional registration problem as an alignment error $\ell_0$-minimization problem. The $\ell_0$-minimization problem is formulated for each local set, where those local sets are built on a compatibility graph of input correspondences. To resolve the $\ell_0$-minimization, we develop a novel two-stage decoupling strategy, which first decouples the alignment error into a rotation fitting error and a translation fitting error. Second, null-space matrices are employed to decouple inlier identification from the estimation of rotation and translation respectively, thereby applying Bayesian theory to $\ell_0$-minimization problems and solving for fitting errors. Correspondences with the smallest errors are identified as inliers to generate a transformation hypothesis for each local set. The best hypothesis is selected to perform registration. We demonstrate that the proposed inlier identification algorithm is robust under high outlier ratios and noise through experiments. Extensive results on the KITTI, 3DMatch, and 3DLoMatch datasets demonstrate that our method achieves state-of-the-art performance compared to both traditional and learning-based methods in various indoor and outdoor scenes.
RTFF: Random-to-Target Fabric Flattening Policy using Dual-Arm Manipulator
Tang, Kai, Bhattacharya, Dipankar, Xu, Hang, Tokuda, Fuyuki, Tien, Norman C., Kosuge, Kazuhiro
Robotic fabric manipulation in garment production for sewing, cutting, and ironing requires reliable flattening and alignment, yet remains challenging due to fabric deformability, effectively infinite degrees of freedom, and frequent occlusions from wrinkles, folds, and the manipulator's End-Effector (EE) and arm. To address these issues, this paper proposes the first Random-to-Target Fabric Flattening (RTFF) policy, which aligns a random wrinkled fabric state to an arbitrary wrinkle-free target state. The proposed policy adopts a hybrid Imitation Learning-Visual Servoing (IL-VS) framework, where IL learns with explicit fabric models for coarse alignment of the wrinkled fabric toward a wrinkle-free state near the target, and VS ensures fine alignment to the target. Central to this framework is a template-based mesh that offers precise target state representation, wrinkle-aware geometry prediction, and consistent vertex correspondence across RTFF manipulation steps, enabling robust manipulation and seamless IL-VS switching. Leveraging the power of mesh, a novel IL solution for RTFF-Mesh Action Chunking Transformer (MACT)-is then proposed by conditioning the mesh information into a Transformer-based policy. The RTFF policy is validated on a real dual-arm tele-operation system, showing zero-shot alignment to different targets, high accuracy, and strong generalization across fabrics and scales. Project website: https://kaitang98.github.io/RTFF_Policy/
Leveraging Perfect Multimodal Alignment and Gaussian Assumptions for Cross-modal Transfer
Multimodal alignment aims to construct a joint latent vector space where two modalities representing the same concept map to the same vector. We formulate this as an inverse problem and show that under certain conditions perfect alignment can be achieved. We then address a specific application of alignment referred to as cross-modal transfer. Unsupervised cross-modal transfer aims to leverage a model trained with one modality to perform inference on another modality, without any labeled fine-tuning on the new modality. Assuming that semantic classes are represented as a mixture of Gaussians in the latent space, we show how cross-modal transfer can be performed by projecting the data points from the representation space on to difference subspaces representing each modality. Our experiments on synthetic multimodal Gaussian data verify the effectiveness of our perfect alignment and cross-modal transfer method. We hope these findings inspire further exploration of the applications of perfect alignment and the use of Gaussian models for cross-modal learning.
Memory-updated-based Framework for 100% Reliable Flexible Flat Cables Insertion
Ling, Zhengrong, Yang, Xiong, Guo, Dong, Chang, Hongyuan, Zhang, Tieshan, Zhang, Ruijia, Shen, Yajing
Automatic assembly lines have increasingly replaced human labor in various tasks; however, the automation of Flexible Flat Cable (FFC) insertion remains unrealized due to its high requirement for effective feedback and dynamic operation, limiting approximately 11% of global industrial capacity. Despite lots of approaches, like vision-based tactile sensors and reinforcement learning, having been proposed, the implementation of human-like high-reliable insertion (i.e., with a 100% success rate in completed insertion) remains a big challenge. Drawing inspiration from human behavior in FFC insertion, which involves sensing three-dimensional forces, translating them into physical concepts, and continuously improving estimates, we propose a novel framework. This framework includes a sensing module for collecting three-dimensional tactile data, a perception module for interpreting this data into meaningful physical signals, and a memory module based on Bayesian theory for reliability estimation and control. This strategy enables the robot to accurately assess its physical state and generate reliable status estimations and corrective actions. Experimental results demonstrate that the robot using this framework can detect alignment errors of 0.5 mm with an accuracy of 97.92% and then achieve a 100% success rate in all completed tests after a few iterations. This work addresses the challenges of unreliable perception and control in complex insertion tasks, highlighting the path toward the development of fully automated production lines.